On Analyzing Web Log Data: A Parallel Sequence Mining Algorithm

Author

  • Ayhan Demiriz
Abstract

Activities at enterprise-class web sites, as well as other web sites, are usually recorded via web logs. Collected logs consist of records from many click streams, which are defined as collections of hits (requests) from a specific user during a specific session. Using web logs is the most common way of collecting click stream data at this time. Thus data warehouses are built based on the crucial data extracted from web logs. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in site web logs. In this process, raw web logs are first cleaned and inserted into a data warehouse. The click streams are then mined by webSPADE, the proposed algorithm, which uses one full scan and several partial scans of the data. An innovative web-based front-end is used for visualizing and querying the sequence mining results. By utilizing relational database technology, this analysis technique enables the analysis of very large amounts of data in a short amount of time. (Draft of November 18, 2003, 11:25am.)
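To make the abstract's notion of a click stream concrete, here is a minimal Python sketch. It is not webSPADE: it runs in memory rather than on a relational database, it is not parallel, and it only counts ordered page pairs. The record layout `(user_id, session_id, timestamp, page)`, the sample data, and the function names `build_click_streams` and `frequent_pairs` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical cleaned log records: (user_id, session_id, timestamp, page).
HITS = [
    ("u1", "s1", 1, "/home"),
    ("u1", "s1", 2, "/search"),
    ("u1", "s1", 3, "/cart"),
    ("u2", "s7", 1, "/home"),
    ("u2", "s7", 2, "/cart"),
    ("u3", "s9", 1, "/search"),
    ("u3", "s9", 2, "/home"),
]


def build_click_streams(hits):
    """Group hits into click streams keyed by (user, session), in time order."""
    streams = defaultdict(list)
    for user, session, _ts, page in sorted(hits, key=lambda h: h[2]):
        streams[(user, session)].append(page)
    return dict(streams)


def frequent_pairs(streams, min_support):
    """Count, per click stream, each ordered pair (a before b) at most once,
    and keep the pairs that occur in at least min_support streams."""
    support = defaultdict(int)
    for pages in streams.values():
        seen = set()
        # combinations over indices yields i < j, so order is preserved.
        for i, j in combinations(range(len(pages)), 2):
            seen.add((pages[i], pages[j]))
        for pair in seen:
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


streams = build_click_streams(HITS)
result = frequent_pairs(streams, min_support=2)
print(result)
```

With the sample data, only the pair `/home` followed by `/cart` appears in two different click streams, so it is the only sequence that meets a minimum support of 2.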

Similar resources

webSPADE: A Parallel Sequence Mining Algorithm to Analyze Web Log Data

Enterprise-class web sites receive a large amount of traffic, from both registered and anonymous users. Data warehouses are built to store and help analyze the click streams within this traffic to provide companies with valuable insights into the behavior of their customers. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in site web logs...

Web Log Mining Based-on Improved Double-Points Crossover Genetic Algorithm

Web log files have become an important data source for discovering user behaviors. Analyzing web log files is one of the significant research fields of web mining. This paper proposes an improved double-points crossover genetic algorithm for mining user access patterns from web log files. Our work contains three different components. First, we design a coding rule according to pre-processed web...

webSPADE: A Parallel Sequence Mining Algorithm to Analyze the Web Log Data

Enterprise-class web sites receive a large amount of traffic, both from registered and anonymous users. This traffic consists of many click streams, which are defined as collections of hits (requests) from a specific user during a specific session. Data warehouses are built to store and help analyze these click streams to provide companies with valuable insights into their customers' behaviors. ...

Sequential Pattern Mining from Web Log Data

Sequential Pattern Mining involves applying data mining methods to large web data repositories to extract usage patterns. With the growing popularity of the World Wide Web, many websites typically experience thousands of visitors every day. Analysis of who browsed what can give important insight into the buying patterns of existing customers. Correct and timely decisions made based on this knowledge...

Hybrid Model for Preprocessing and Clustering of Web Server Log

As the World Wide Web (WWW) grows in usage, in complexity, and in the volume of traffic to web sites, it has become very important to analyze this web traffic and the way users use a web site. Web usage mining is a main research area in web mining focused on learning about web users and their interaction with web sites. The information like server log, ...

Publication date: 2003